
arena: rename inner-SIMD-align knob and drop default 64 -> 32 #552

Merged
evaleev merged 1 commit into master from evaleev/arena/rename-and-bump-simd-align-default
May 21, 2026


@evaleev commented May 21, 2026

Summary

Two related changes to the ArenaTensor in-cell alignment knob in src/TiledArray/tensor/arena_tensor.h:

  • Rename TILEDARRAY_INNER_SIMD_ALIGN → TILEDARRAY_ARENATENSOR_SIMD_ALIGN (and kInnerSimdAlign → kArenaTensorSimdAlign), so the knob's name reflects the type whose layout it parametrizes. Hard cut: no compat alias, since there are no external users yet.
  • Lower the default from 64 B to 32 B. 32 B covers AVX2 YMM (the most common x86_64 SIMD target today) and shaves 32 B off data_offset per inner cell. AVX-512 builds that want the wider floor stay one -DTILEDARRAY_ARENATENSOR_SIMD_ALIGN=64 away.

Why this matters: each ArenaTensor cell pads from sizeof(Cell) (~14 B for btas::zb::RangeNd<>) up to this alignment before its element storage, so the per-inner-cell bookkeeping is data_offset plus an 8 B view pointer. On a ToT tile with millions of inner cells (e.g. PNO-CCSD), the difference between 32 B and 64 B padding is 32 B per cell (about 32 MB per million cells), which adds up to memory on the order of hundreds of MB.
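
The arithmetic behind that estimate can be sketched as below. This is only an illustration, not the code in arena_tensor.h: `align_up`, `bookkeeping_bytes`, `saved_bytes`, and the two byte constants are hypothetical names, with the 14 B header and 8 B view pointer taken from the figures quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Assumed figures from the PR description: ~14 B cell header, 8 B view pointer.
constexpr std::size_t kCellHeaderBytes = 14;
constexpr std::size_t kViewPtrBytes = 8;

// Per-inner-cell bookkeeping: padded header (data_offset) plus the view pointer.
constexpr std::size_t bookkeeping_bytes(std::size_t simd_align) {
  return align_up(kCellHeaderBytes, simd_align) + kViewPtrBytes;
}

// Bytes saved across `cells` inner cells when the floor drops from 64 B to 32 B.
constexpr std::size_t saved_bytes(std::size_t cells) {
  return (bookkeeping_bytes(64) - bookkeeping_bytes(32)) * cells;
}

static_assert(bookkeeping_bytes(32) == 40, "32 B data_offset + 8 B pointer");
static_assert(bookkeeping_bytes(64) == 72, "64 B data_offset + 8 B pointer");
```

Under these assumptions, `saved_bytes(5'000'000)` evaluates to 160,000,000 bytes, i.e. roughly 160 MB for an illustrative five million inner cells.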

The doc comment now spells out the reasonable overrides:

  • 64 — AVX-512 ZMM (and the x86_64 cache line)
  • 16 — NEON-only / Apple Silicon (NEON has no wider register, and Apple Silicon doesn't implement SVE)
  • 128 — two-cache-line / Apple-Silicon L1-line floor (false-sharing motivation only)
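
A minimal sketch of how such a compile-time knob is typically shaped, assuming the macro and constant names from this description. The power-of-two check and the `covers_register` helper are hypothetical additions for illustration, and the actual header may differ.

```cpp
#include <cassert>
#include <cstddef>

// Assumed shape of the knob: override at compile time, e.g. with
// -DTILEDARRAY_ARENATENSOR_SIMD_ALIGN=64; otherwise fall back to 32.
#ifndef TILEDARRAY_ARENATENSOR_SIMD_ALIGN
#define TILEDARRAY_ARENATENSOR_SIMD_ALIGN 32
#endif

constexpr std::size_t kArenaTensorSimdAlign = TILEDARRAY_ARENATENSOR_SIMD_ALIGN;

// Hypothetical sanity check: an alignment must be a nonzero power of two.
constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}
static_assert(is_power_of_two(kArenaTensorSimdAlign),
              "SIMD alignment must be a power of two");

// Hypothetical helper: does an alignment floor cover a SIMD register of the
// given width in bytes? For powers of two this means align >= register_bytes.
constexpr bool covers_register(std::size_t align, std::size_t register_bytes) {
  return align % register_bytes == 0;
}

static_assert(covers_register(32, 32), "32 B covers AVX2 YMM");
static_assert(!covers_register(32, 64), "32 B does not cover AVX-512 ZMM");
static_assert(covers_register(64, 64), "64 B covers AVX-512 ZMM");
static_assert(covers_register(16, 16), "16 B covers a 128-bit NEON register");
```

The checks mirror the override list above: 32 B is enough for YMM, while a ZMM-aligned floor needs the 64 B override.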

Test plan

  • arena_suite, arena_kernels_suite, arena_einsum_unit_suite, arena_tot_trivial_suite, arena_sizeof_invariant_suite, arena_tensor_suite, arena_tensor_kernels_suite all pass against the new default (np=1, debug build, TA_ASSERT_POLICY=TA_ASSERT_THROW).
  • CI (np=1 + np=2, full matrix) — let CI run.
  • MPQC consumer build with TA repinned to this commit — done out-of-tree.

…4 -> 32

Rename the in-arena element-storage alignment knob to TA_ARENATENSOR_SIMD_ALIGN
(matching the TA_ prefix used by every other TA CMake option/var), and wire
it through the same CMake -> config.h.in -> header pipeline as
TA_MAX_SOO_RANK_METADATA. The previous TILEDARRAY_INNER_SIMD_ALIGN was a
header-only #ifndef/#define knob with no CMake surface; the new form is a
proper cache variable, documented in INSTALL.md.

Drop the default from 64 B to 32 B: 32 B covers AVX2 YMM loads/stores (the
most common x86_64 SIMD target today) and shaves 32 B/cell off the in-arena
padding. AVX-512 builds that want a wider floor are one
`-DTA_ARENATENSOR_SIMD_ALIGN=64` away. The doc comment and INSTALL.md entry
also call out the NEON / Apple-Silicon options (16 / 128).

No backward-compatible alias for the old macro/constant names -- there are
no external users yet.
@evaleev force-pushed the evaleev/arena/rename-and-bump-simd-align-default branch from cee791e to 3da31da May 21, 2026 09:51
@evaleev merged commit 600c4ad into master May 21, 2026
9 checks passed
@evaleev deleted the evaleev/arena/rename-and-bump-simd-align-default branch May 21, 2026 14:26